{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Extract MFCC features\n",
    "This notebook shows how to use PyKaldi to extract MFCC features from a WAV file.\n",
    "It also serves as an example of NumPy and scikit-learn integration."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We begin by importing all the necessary components from PyKaldi, NumPy and scikit-learn. If you haven't done so already, you'll need to install them on your system. You can install NumPy and scikit-learn using __pip__ (e.g., `pip install numpy scikit-learn`). To install PyKaldi, please follow the [installation instructions](https://github.com/pykaldi/pykaldi#installation)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from kaldi.feat.mfcc import Mfcc, MfccOptions\n",
    "from kaldi.matrix import SubVector, SubMatrix\n",
    "from kaldi.util.options import ParseOptions\n",
    "from kaldi.util.table import SequentialWaveReader\n",
    "from kaldi.util.table import MatrixWriter\n",
    "from numpy import mean\n",
    "from sklearn.preprocessing import scale"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We create a test scp file containing a dummy key and the location of our test WAV file. Make sure the path below points to the test file inside your own PyKaldi installation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "with open(\"testfile.scp\", \"w\") as outpt:\n",
    "    outpt.write(\"TEST /pykaldi/tools/kaldi/src/feat/test_data/test.wav\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The PyKaldi option parsing API is slightly different from the underlying Kaldi option parsing API. Command-line options for the main script are registered by calling type-specific registration methods that accept a name, a default value and a help string, e.g. `min-duration` in the example. The `parse_args` method of a PyKaldi `ParseOptions` instance returns a simple namespace object containing the parsed option values for the main script. Parsed values for other options are written directly into the appropriate fields of the associated options instances, e.g. `mfcc_opts` in the example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "usage = \"\"\"Extract MFCC features.\n",
    "           Usage:  example.py [opts...] <rspec> <wspec>\n",
    "        \"\"\"\n",
    "\n",
    "po = ParseOptions(usage)\n",
    "po.register_float(\"min-duration\", 0.0,\n",
    "                  \"minimum segment duration\")\n",
    "mfcc_opts = MfccOptions()\n",
    "mfcc_opts.frame_opts.samp_freq = 8000\n",
    "mfcc_opts.register(po)\n",
    "\n",
    "opts = po.parse_args()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In typical Kaldi fashion, input/output tables are constructed with read/write specifiers, strings that describe how the data should be read/written."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "rspec, wspec = \"scp:testfile.scp\", \"ark,t:test_mfcc.ark\""
   ]
  },
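  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the specifier strings above more concrete, the sketch below splits a specifier such as `ark,t:test_mfcc.ark` into its table type, its options (here `t`, which requests text-mode output) and its file name, and parses one line of a script (`scp`) file into a key and a location. The helpers `split_specifier` and `parse_scp_line` are hypothetical names written only for illustration; they are not part of PyKaldi or Kaldi, whose own parsers handle many more cases.\n",
    "\n",
    "```python\n",
    "def split_specifier(spec):\n",
    "    # Illustrative only: split 'type[,opt...]:filename' at the first colon.\n",
    "    head, _, filename = spec.partition(':')\n",
    "    kind, *options = head.split(',')\n",
    "    return kind, options, filename\n",
    "\n",
    "\n",
    "def parse_scp_line(line):\n",
    "    # Illustrative only: a script-file line is '<key> <location>'.\n",
    "    key, location = line.strip().split(None, 1)\n",
    "    return key, location\n",
    "\n",
    "\n",
    "print(split_specifier('scp:testfile.scp'))\n",
    "print(split_specifier('ark,t:test_mfcc.ark'))\n",
    "print(parse_scp_line('TEST /pykaldi/tools/kaldi/src/feat/test_data/test.wav'))\n",
    "```\n",
    "\n",
    "In the notebook itself none of this parsing is needed: the specifier strings are handed directly to the table reader and writer."
   ]
  },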
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Kaldi objects are first-class objects in PyKaldi: they can be instantiated, assigned to variables, and passed as arguments to other objects or functions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create MFCC object and obtain sample frequency\n",
    "mfcc = Mfcc(mfcc_opts)\n",
    "sf = mfcc_opts.frame_opts.samp_freq"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "PyKaldi table readers and writers implement the context manager interface, so they do not need to be closed explicitly when they are used in a `with` statement. PyKaldi table writers also support a pseudo-dictionary interface for writing key-value pairs. Since PyKaldi matrices implement the NumPy array interface, they can be passed to functions expecting NumPy array arguments, such as `mean` and `scale`, without explicit conversion. The NumPy arrays returned from such functions can be converted back to Kaldi vector and matrix types by constructing new `SubVector` and `SubMatrix` objects, which share the underlying memory buffers with the source arrays whenever possible, i.e. no data is copied unless necessary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      ">>> print(wav.samp_freq)\n",
      "16000.0\n",
      "\n",
      ">>> print(s)\n",
      "\n",
      " 11891  28260      0  ...     356    360    362\n",
      "[kaldi.matrix.Matrix of size 1x23001]\n",
      "\n",
      "\n",
      ">>> print(f)\n",
      "\n",
      " 0.9177 -0.6260 -0.0099  ...  -0.8744  0.4648  0.8280\n",
      "-1.4267  0.5150 -1.0192  ...  -2.6820  0.5182  1.2632\n",
      "-1.2150  0.7154 -0.8618  ...  -0.4959  0.6084  0.7360\n",
      "          ...             ⋱             ...          \n",
      "-1.6294  0.1008  0.5520  ...  -1.3087  1.6499  1.2011\n",
      "-1.6358 -0.6769 -0.0806  ...  -1.3102  0.1379 -0.0234\n",
      "-1.9316 -0.0121  0.2750  ...  -0.3298  2.5171  1.1672\n",
      "[kaldi.matrix.SubMatrix of size 142x13]\n",
      "\n",
      "\n"
     ]
    }
   ],
   "source": [
    "with SequentialWaveReader(rspec) as reader, \\\n",
    "             MatrixWriter(wspec) as writer:\n",
    "\n",
    "    for key, wav in reader:\n",
    "        if wav.duration < opts.min_duration:\n",
    "            continue\n",
    "\n",
    "        assert wav.samp_freq >= sf\n",
    "        assert wav.samp_freq % sf == 0\n",
    "\n",
    "        print(\">>> print(wav.samp_freq)\")\n",
    "        print(wav.samp_freq)\n",
    "        print()\n",
    "        \n",
    "        s = wav.data()\n",
    "        print(\">>> print(s)\")\n",
    "        print(s)\n",
    "        print()\n",
    "        \n",
    "        # downsample to sf (8 kHz, as set in mfcc_opts) by keeping every n-th sample;\n",
    "        # note that no anti-aliasing filter is applied\n",
    "        s = s[:, ::int(wav.samp_freq / sf)]\n",
    "\n",
    "        # mix-down stereo to mono\n",
    "        m = SubVector(mean(s, axis=0))\n",
    "\n",
    "        # compute MFCC features\n",
    "        f = mfcc.compute_features(m, sf, 1.0)\n",
    "\n",
    "        # standardize features\n",
    "        f = SubMatrix(scale(f))\n",
    "        print(\">>> print(f)\")\n",
    "        print(f)\n",
    "        print()\n",
    "        \n",
    "        # write features to archive\n",
    "        writer[key] = f"
   ]
  },
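  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The array operations in the loop above can be tried without PyKaldi. The sketch below repeats the downsampling, mix-down and standardization steps on small synthetic NumPy arrays; the array values, shapes and variable names are made up for illustration, and the standardization is written out by hand to mirror what `scale` does by default (zero mean and unit variance per column).\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Synthetic stand-in for wav.data(): 2 channels x 16 samples (made-up values).\n",
    "signal = np.arange(32, dtype=np.float64).reshape(2, 16)\n",
    "source_rate, target_rate = 16000, 8000\n",
    "\n",
    "# Keep every n-th sample: plain decimation, with no anti-aliasing filter.\n",
    "step = source_rate // target_rate\n",
    "downsampled = signal[:, ::step]\n",
    "\n",
    "# Average over the channel axis (rows) to obtain a mono signal.\n",
    "mono = downsampled.mean(axis=0)\n",
    "\n",
    "# Standardize each column of a small feature-like matrix.\n",
    "features = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])\n",
    "standardized = (features - features.mean(axis=0)) / features.std(axis=0)\n",
    "\n",
    "print(downsampled.shape)\n",
    "print(mono)\n",
    "print(standardized.mean(axis=0), standardized.std(axis=0))\n",
    "```\n",
    "\n",
    "In the notebook, the same slicing and `mean` call operate directly on the Kaldi matrix returned by `wav.data()`, and the results are wrapped in `SubVector` and `SubMatrix` before being passed back to Kaldi code."
   ]
  },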
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
